The problem we concentrate on is as follows: given (1) a convex
compact set $X$ in $\mathbb{R}^n$, an affine mapping $x \mapsto A(x)$, a parametric
family $\{p_\mu(\cdot)\}$ of probability densities and (2) $N$ i.i.d. observations of
the random variable $\omega$, distributed with the density $p_{A(x)}(\cdot)$ for some
(unknown) $x \in X$, estimate the value $g^T x$ of a given linear form at $x$.

For several families $\{p_\mu(\cdot)\}$ with no additional assumptions on
X and A, we develop computationally efficient estimation routines 
which are minimax optimal, within an absolute constant factor. We 
then apply these routines to recovering x itself in the Euclidean norm. 

1. Introduction. The problem we are interested in is essentially as follows:
suppose that we are given a convex compact set $X$ in $\mathbb{R}^n$, an affine
mapping $x \mapsto A(x)$ and a parametric family $\{p_\mu(\cdot)\}$ of probability densities.
Suppose that $N$ i.i.d. observations of the random variable $\omega$, distributed with
the density $p_{A(x)}(\cdot)$ for some (unknown) $x \in X$, are available. Our objective
is to estimate the value $g^T x$ of a given linear form at $x$.

In nonparametric statistics, there exists an immense literature on various
versions of this problem (see, e.g., [10, 11, 12, 13, 15, 17, 18, 21, 22, 23, 24, 
25, 26, 27, 28] and the references therein). To the best of our knowledge, 
the majority of papers on the subject focus on specific domains X (e.g., 
distributions with densities from Sobolev balls), and investigate lower and 
upper bounds on the worst-case, with regard to $x \in X$, accuracy to which
the problem of interest can be solved. These bounds depend on the number
of observations $N$, and the question of primary interest is the behavior of
those bounds as $N \to \infty$. When the lower and the upper bounds coincide
within a constant factor [or, ideally, within factor $(1 + o(1))$ as $N \to \infty$],
the estimation problem is considered essentially solved, and the estimation
methods underlying the upper bounds are treated as optimal.

The approach we adopt in this paper is of a different spirit; we make no
"structural assumptions" on X, aside from assumptions of convexity and 
compactness which are crucial for us, and we make no assumptions on the 
linear functional p. Clearly, with no structural assumptions on X and p, 
explicit bounds on the risks of our estimates, as well as bounds on the mini- 
max optimal risk, are impossible. However, it is possible to show that when 
estimating linear forms, the worst-case risk of the estimator we propose is 
within an absolute constant factor of the "ideal" (i.e., the minimax optimal) 
risk. It should be added that while the worst-case risk of our estimates, which
is optimal within an absolute constant factor, is not available in closed
analytical form, it is "available algorithmically": it can be efficiently computed,
provided that $X$ is computationally tractable (see footnote 1).

Note that the estimation problem, presented above, can be seen as a 
generalization of the problem of estimation of linear functionals of the central 
parameter of a normal distribution (see [4, 8, 9, 16]). Namely, suppose that 
the observation $\omega \in \mathbb{R}^m$,

$$\omega = Ax + \sigma\xi$$

of the unknown signal $x$ is available. Here $A$ is a given $m \times n$ matrix and
$\xi \sim \mathcal{N}(0, I_m)$, $\sigma > 0$ is known. For this important case the problem has been
essentially solved in [5], where it was proved that for several commonly used
loss functions, the minimax optimal affine in $\omega$ estimate is minimax optimal,
within an absolute constant factor, among all possible estimates. 

Another special case of our setting is the problem of estimating a linear 
functional $g(p)$ of an unknown distribution $p$, given $N$ i.i.d. observations
$\omega_1, \ldots, \omega_N$, which obey $p$. We suppose that it is known a priori that $p \in X$,
where X is a given convex compact set of distributions (here the parameter 
x is the density p itself). Some important results for this problem have been 
obtained in [6] and [7]. For instance, in [7] the authors established minimax 
bounds for the risk of estimation of g(p) and developed an estimation method 
based on the binary search algorithm. The estimation procedure uses at each 
search iteration tests of convex hypotheses, studied in [2, 3]. That estimator 
of g(p) is shown to be minimax optimal (within an absolute constant factor) 
if some basic structural assumptions about X hold. 

In this paper, we concentrate on the properties of affine estimators. Here,
we refer to an estimator $\hat g$ as affine when it is of the form
$\hat g(\omega_1, \ldots, \omega_N) = \sum_{i=1}^N \phi(\omega_i)$, for some given function $\phi$, that is, if $\hat g$ is an
affine function of the empirical distribution. When $\phi$ itself is an affine
function, the estimator is also affine in the observations, as it is in the
setting of [5]. Our motivation is to extend the results obtained in [5] to the
non-Gaussian situation. In particular, we propose a technique for deriving
affine estimators which are minimax optimal (up to a moderate absolute
constant) for a class of "good parametric families of distributions," which is
defined in Section 2.1. Since the normal family and discrete distributions
belong to the class of good parametric families, the minimax optimal
estimators for these cases are obtained by direct application of the general
construction. In this sense, our results generalize those of [7] and [5] on the
estimation of linear functionals. On the other hand, the techniques presented in
the current paper clearly descend from those developed in [3] and [7]. To make a
computationally efficient solution of the estimation problem possible, unlike
the authors of those papers, we concentrate only on the finite-dimensional
situation. As a result, the proposed estimation procedures allow efficient
numerical implementation. This also allows us to avoid many intricate
mathematical details. However, we allow the dimension to be arbitrarily large,
thus addressing, essentially, a nonparametric estimation problem.

Footnote 1. For details on computational tractability and complexity issues in
convex optimization, see, for example, [1], Chapter 4. A reader not familiar
with this area will not lose much when interpreting a computationally tractable
convex set as a set given by a finite system of inequalities $p_i(x) \le 0$,
$i = 1, \ldots, m$, where $p_i(x)$ are convex polynomials.

The rest of this paper is organized as follows. In Section 2, we define the 
main components of our study — we state the estimation problem and define 
the corresponding risk measures. Then, in Section 3, we provide the general 
solution to the estimation problem, which is then applied, in Section 4, to 
the problems of estimating linear functionals in the normal model and the 
tomography model. Finally, in Section 5, we present adaptive versions of 
affine estimators. 

Note that when passing from recovering linear forms of the unknown 
signal to recovering the signal itself, we do impose structural assumptions 
on $X$, but still make no structural assumptions on the affine mapping $A(x)$.
Our "optimality results" become weaker: instead of "optimality within an
absolute constant factor" we end up with statements like "the worst-case
risk of such-and-such estimate is in between the minimax optimal risk and
the latter risk to the power $\chi$," with $\chi$ depending on the geometry of $X$
(and close to 1 when this geometry is "good enough"). 

2. Problem statement. 

2.1. Good parametric families of distributions. Let $(\Omega, P)$ be a Polish
space with Borel $\sigma$-finite measure, and $\mathcal{M} \subset \mathbb{R}^m$. Assume that every $\mu \in \mathcal{M}$
is associated with a probability density $p_\mu(\omega)$, a Borel nonnegative function
on $\Omega$ such that $\int_\Omega p_\mu(\omega) P(d\omega) = 1$; we refer to the mapping $\mu \to p_\mu(\cdot)$ as a
parametric density family $\mathcal{D}$. Let also $\mathcal{F}$ be a finite-dimensional linear space
of Borel functions on $\Omega$ which contains constants. We call a pair $(\mathcal{D}, \mathcal{F})$ good
if it possesses the following properties:

1. $\mathcal{M}$ is an open convex set in $\mathbb{R}^m$;

2. whenever $\mu \in \mathcal{M}$, we have $p_\mu(\omega) > 0$ everywhere on $\Omega$;

3. whenever $\mu, \nu \in \mathcal{M}$, we have $\phi(\omega) = \ln(p_\mu(\omega)/p_\nu(\omega)) \in \mathcal{F}$;

4. whenever $\phi(\omega) \in \mathcal{F}$, the function

$$\mu \mapsto \ln\left(\int_\Omega \exp\{\phi(\omega)\}\, p_\mu(\omega) P(d\omega)\right)$$

is well defined and concave in $\mu \in \mathcal{M}$.

The reader familiar with exponential families will immediately recognize
that the above definition implies that $\mathcal{D}$ is such a family. Let us denote by
$p_\mu(\omega) = \exp\{\theta(\mu)^T \omega - C(\theta(\mu))\}$, $\mu \in \mathcal{M}$, its density with regard to $P$, where
$\theta$ is the natural parameter and $C(\cdot)$ is the cumulant function. Then, $\mathcal{D}$ is
good if:

1. $\mathcal{M}$ is an open convex set in $\mathcal{D}_P = \{\mu \in \mathbb{R}^m \mid \int_\Omega e^{\theta(\mu)^T \omega} P(d\omega) < \infty\}$;

2. for any $\phi$ such that the cumulant function $C(\theta(\mu) + \phi)$ is well defined,
the function $[C(\theta(\mu) + \phi) - C(\theta(\mu))]$ is concave in $\mu \in \mathcal{M}$.

Let us list several examples. 



Example 1 (Discrete distributions). Let $\Omega = \{1, 2, \ldots, M\}$ be a finite
set, $P$ be the counting measure on $\Omega$, $\mathcal{M} = \{\mu \in \mathbb{R}^M : \mu > 0, \sum_i \mu_i = 1\}$ and
$p_\mu(i) = \mu_i$, $i = 1, \ldots, M$. Let also $\mathcal{F}$ be the set of all functions on $\Omega$. The
associated pair $(\mathcal{D}, \mathcal{F})$ clearly is good.

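Example 2 (Poisson distributions). Let $\Omega = \{0, 1, \ldots\}$, $P$ be the counting
measure on $\Omega$, $\mathcal{M} = \{\mu \in \mathbb{R} : \mu > 0\}$ and $p_\mu(i) = \mu^i \exp\{-\mu\}/i!$, $i \in \Omega$, so that
$p_\mu$ is the Poisson distribution with the parameter $\mu$. Let also $\mathcal{F}$ be the set of
affine functions $\phi(i) = \alpha i + \beta$ on $\Omega$. We claim that the associated pair $(\mathcal{D}, \mathcal{F})$
is good. Indeed, $\ln(p_\mu(i)/p_\nu(i)) = i[\ln\mu - \ln\nu] + \nu - \mu$ is an affine function
of $i$, and

$$\ln\left(\sum_{i=0}^{\infty} \exp\{\alpha i + \beta\}\, \frac{\mu^i \exp\{-\mu\}}{i!}\right) = \ln(\exp\{\beta - \mu\}\exp\{\mu\exp\{\alpha\}\}) = \beta - \mu + \mu\exp\{\alpha\}$$

is a concave function of $\mu > 0$.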
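The closed form in Example 2 can be checked numerically. The following is a minimal sketch, assuming a truncation of the Poisson series at 200 terms (an arbitrary choice that is ample for moderate parameters); the function names are illustrative and do not come from the paper.

```python
import math

def poisson_log_mgf(alpha, beta, mu, terms=200):
    """ln sum_i exp(alpha*i + beta) * mu**i * exp(-mu) / i!, summed term by term."""
    total = 0.0
    for i in range(terms):
        # Work in logs to avoid overflow in mu**i and i!.
        log_term = alpha * i + beta + i * math.log(mu) - mu - math.lgamma(i + 1)
        total += math.exp(log_term)
    return math.log(total)

def closed_form(alpha, beta, mu):
    """beta - mu + mu*exp(alpha), the expression derived in Example 2."""
    return beta - mu + mu * math.exp(alpha)

alpha, beta, mu = 0.3, -0.5, 2.0  # arbitrary test values
numeric = poisson_log_mgf(alpha, beta, mu)
exact = closed_form(alpha, beta, mu)
```

Since the closed form is affine in $\mu$ for fixed $\alpha$ and $\beta$, it is in particular concave, which is all that property 4 of a good pair requires.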

Example 3 (Gaussian distributions with fixed covariance). Let $\Omega = \mathbb{R}^k$,
$P$ be the Lebesgue measure on $\Omega$, $\Sigma$ be a positive definite $k \times k$ matrix,
$\mathcal{M} = \mathbb{R}^k$ and

$$p_\mu(\omega) = (2\pi)^{-k/2} (\mathrm{Det}\,\Sigma)^{-1/2} \exp\{-(\omega - \mu)^T \Sigma^{-1} (\omega - \mu)/2\}$$

be the density of the Gaussian distribution with mean $\mu$ and covariance
matrix $\Sigma$. Let, further, $\mathcal{F}$ be comprised of affine functions on $\Omega$. We claim
that the associated pair $(\mathcal{D}, \mathcal{F})$ is good. Indeed, the function $\ln(p_\mu(\omega)/p_\nu(\omega))$
is affine on $\Omega$, and

$$\ln\left(\int \exp\{\phi^T \omega + c\}\, p_\mu(\omega)\, d\omega\right) = c + \phi^T \mu + \frac{\phi^T \Sigma \phi}{2}$$

is a concave function of $\mu$.

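Example 4 (Direct product of good pairs). Let $p^\ell_{\mu_\ell}(\omega_\ell)$ be a probability
density, parameterized by $\mu_\ell \in \mathcal{M}_\ell \subset \mathbb{R}^{m_\ell}$, on a Polish space $\Omega_\ell$ with Borel
$\sigma$-finite measure $P_\ell$, and $\mathcal{F}_\ell$ be a finite-dimensional linear space of Borel
functions on $\Omega_\ell$ such that the associated pairs $(\mathcal{D}_\ell, \mathcal{F}_\ell)$ are good, $\ell = 1, \ldots, L$. Let us
define the direct product $(\mathcal{D}, \mathcal{F}) = \bigotimes_{\ell=1}^L (\mathcal{D}_\ell, \mathcal{F}_\ell)$ of these pairs as follows:

• The associated space with measure is $(\Omega = \Omega_1 \times \cdots \times \Omega_L,\ P = P_1 \times \cdots \times P_L)$.

• The set of parameters is $\mathcal{M} = \mathcal{M}_1 \times \cdots \times \mathcal{M}_L$, and the density associated
with a parameter $\mu = (\mu_1, \ldots, \mu_L)$ from this set is
$p_\mu(\omega_1, \ldots, \omega_L) = \prod_{\ell=1}^L p^\ell_{\mu_\ell}(\omega_\ell)$.

• $\mathcal{F}$ is comprised of all functions $\phi(\omega_1, \ldots, \omega_L) = \sum_{\ell=1}^L \phi_\ell(\omega_\ell)$ with $\phi_\ell(\cdot) \in \mathcal{F}_\ell$,
$\ell = 1, \ldots, L$.

We claim that the direct product of good pairs is good. Indeed, $\mathcal{M}$ is an open
convex set; when $\mu = (\mu_1, \ldots, \mu_L)$ and $\nu = (\nu_1, \ldots, \nu_L)$ are in $\mathcal{M}$, we have

$$\ln(p_\mu(\omega_1, \ldots, \omega_L)/p_\nu(\omega_1, \ldots, \omega_L)) = \sum_{\ell=1}^L \ln(p^\ell_{\mu_\ell}(\omega_\ell)/p^\ell_{\nu_\ell}(\omega_\ell)) \in \mathcal{F}$$

and when $\phi(\omega_1, \ldots, \omega_L) = \sum_{\ell=1}^L \phi_\ell(\omega_\ell) \in \mathcal{F}$, we have

$$\ln\left(\int_\Omega \exp\{\phi(\omega)\}\, p_\mu(\omega) P(d\omega)\right) = \sum_{\ell=1}^L \ln\left(\int_{\Omega_\ell} \exp\{\phi_\ell(\omega_\ell)\}\, p^\ell_{\mu_\ell}(\omega_\ell) P_\ell(d\omega_\ell)\right),$$

which is a sum of concave functions of $\mu_\ell$ and thus is concave in $\mu$.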

2.2. The problem. The problem we are interested in is as follows:

Problem I. We are given the following:

• a convex compact set $X \subset \mathbb{R}^n$,

• a good pair $(\mathcal{D}, \mathcal{F})$ comprised of

- a parametric family $\{p_\mu(\omega) : \mu \in \mathcal{M} \subset \mathbb{R}^m\}$ of probability densities on a
Borel space $\Omega$ with $\sigma$-finite Borel measure $P$ and

- a finite-dimensional linear space $\mathcal{F}$ of Borel functions on $\Omega$,

• an affine mapping $x \mapsto A(x) : X \to \mathcal{M}$,

• a linear form $g^T z$ on $\mathbb{R}^n \supset X$.

Aside from this a priori information, we are given a realization $\omega$ of a random
variable taking values in $\Omega$ and distributed with the density $p_{A(x)}(\cdot)$ for some
$x \in X$ unknown in advance. Our goal is to infer from this observation an
estimate $\hat g(\omega)$ of the value $g^T x$ of the given linear form at $x$.

From now on we refer to an estimate as affine if it is of the form $\hat g(\omega) =
\phi(\omega)$, with certain $\phi \in \mathcal{F}$.

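We quantify the risk of a candidate estimate $\hat g(\cdot)$ by its worst-case, over
$x \in X$, confidence interval, given the confidence level. Specifically, given a
confidence level $\epsilon \in (0, 1)$, we define the associated $\epsilon$-risk of an estimate $\hat g$
as

$$\mathrm{Risk}(\hat g; \epsilon) = \inf\left\{\delta : \sup_{x \in X} \mathrm{Prob}_{\omega \sim p_{A(x)}(\cdot)}\{\omega : |\hat g(\omega) - g^T x| > \delta\} < \epsilon\right\}.$$

The corresponding minimax optimal $\epsilon$-risk is defined as

$$\mathrm{Risk}_*(\epsilon) = \inf_{\hat g(\cdot)} \mathrm{Risk}(\hat g; \epsilon),$$

where inf is taken over the space of all Borel functions $\hat g$ on $\Omega$. We are
interested also in the minimax optimal $\epsilon$-risk of affine estimates

$$\mathrm{RiskA}(\epsilon) = \inf_{\phi(\cdot) \in \mathcal{F}} \mathrm{Risk}(\phi; \epsilon).$$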



3. Minimax optimal affine estimators. 

3.1. Main result. Our main result follows. 



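Theorem 3.1. Let the pair $(\mathcal{D}, \mathcal{F})$ underlying Problem I be good. Then,
the minimax optimal risk achievable with affine estimates is, for small $\epsilon$,
within an absolute constant factor of the "true" minimax optimal risk, specifically,

$$0 < \epsilon < 1/4 \;\Rightarrow\; \mathrm{RiskA}(\epsilon) \le \theta(\epsilon)\, \mathrm{Risk}_*(\epsilon), \qquad \theta(\epsilon) = \frac{2\ln(2/\epsilon)}{\ln(1/(4\epsilon))}.$$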
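To get a feel for the size of the factor in Theorem 3.1, the short sketch below evaluates $\theta(\epsilon) = 2\ln(2/\epsilon)/\ln(1/(4\epsilon))$ at the confidence levels used later in Table 1. The function name `theta` is an illustrative choice.

```python
import math

def theta(eps):
    """Nonoptimality factor 2*ln(2/eps) / ln(1/(4*eps)) from Theorem 3.1."""
    if not 0.0 < eps < 0.25:
        raise ValueError("theta(eps) is defined for 0 < eps < 1/4")
    return 2.0 * math.log(2.0 / eps) / math.log(1.0 / (4.0 * eps))

# Values at the confidence levels that appear in Table 1.
values = {eps: round(theta(eps), 2) for eps in (0.05, 0.01, 0.001)}
# values == {0.05: 4.58, 0.01: 3.29, 0.001: 2.75}
```

The factor decreases slowly toward 2 as $\epsilon \to 0$ and blows up as $\epsilon \to 1/4$, which is why the theorem is stated for small $\epsilon$.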

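Proof. For $r > 0$, let us set

$$\Phi_r(x, y; \phi, \alpha) = g^T x - g^T y + \alpha \ln\left(\int_\Omega \exp\{-\alpha^{-1}\phi(\omega)\}\, p_{A(x)}(\omega) P(d\omega)\right) + \alpha \ln\left(\int_\Omega \exp\{\alpha^{-1}\phi(\omega)\}\, p_{A(y)}(\omega) P(d\omega)\right) + 2\alpha r : Z \times \mathcal{F}_+ \to \mathbb{R},$$
$$Z = X \times X, \qquad \mathcal{F}_+ = \mathcal{F} \times \{\alpha > 0\}.$$

We claim that this function is a continuous real-valued function on $Z \times \mathcal{F}_+$
which is convex in $(\phi, \alpha) \in \mathcal{F}_+$ and concave in $(x, y) \in Z$.

Indeed, the function

$$\Psi(\mu, \nu; \phi) = \ln\left(\int_\Omega \exp\{-\phi(\omega)\}\, p_\mu(\omega) P(d\omega)\right) + \ln\left(\int_\Omega \exp\{\phi(\omega)\}\, p_\nu(\omega) P(d\omega)\right) : (\mathcal{M} \times \mathcal{M}) \times \mathcal{F} \to \mathbb{R}$$

is well defined, concave in $(\mu, \nu) \in \mathcal{M} \times \mathcal{M}$ [since $(\mathcal{D}, \mathcal{F})$ is good] and convex in
$\phi \in \mathcal{F}$ (evident). Since $\mathcal{M}$ is open and $\mathcal{F}$ is a finite-dimensional linear space,
$\Psi$ is continuous on its domain. It remains to note that $\Phi_r$ is the sum of a
linear function of $x, y, \alpha$ and the function $\alpha\Psi(A(x), A(y); \alpha^{-1}\phi)$, which clearly
is concave in $(x, y)$ [since $\Psi(\mu, \nu; \phi)$ is concave in $(\mu, \nu)$ and $A(\cdot)$ is affine]
and convex in $(\phi, \alpha) \in \mathcal{F}_+$ [since $\Psi(\mu, \nu; \phi)$ is convex in $\phi \in \mathcal{F}$, and the
transformation $f(u) \mapsto g(u, \alpha) = \alpha f(u/\alpha)$ converts a convex function of $u$ into
a function of $(u, \alpha)$ that is convex on $\{\alpha > 0\}$].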

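Since $Z$ is a convex finite-dimensional compact set, $\mathcal{F}_+$ is a convex
finite-dimensional set and $\Phi_r$ is continuous and convex-concave on $Z \times \mathcal{F}_+$, we
can invoke the Sion-Kakutani theorem (see, e.g., [14]) to infer that

(3.1)  $$\sup_{x, y \in X}\, \inf_{\phi \in \mathcal{F}, \alpha > 0} \Phi_r(x, y; \phi, \alpha) = \inf_{\phi \in \mathcal{F}, \alpha > 0}\, \max_{x, y \in X} \Phi_r(x, y; \phi, \alpha) := 2\Phi_*(r).$$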

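Note that $\Phi_*(r)$ is a concave and nonnegative function of $r \ge 0$. Indeed,
the functional $f_x[h] = \ln \int_\Omega \exp\{h(\omega)\}\, p_{A(x)}(\omega) P(d\omega)$ is well defined and convex
on $\mathcal{F}$, with $f_x[0] = 0$, whence

$$\Phi_r(x, x; \phi, \alpha) = 2\alpha r + \alpha(f_x[-\alpha^{-1}\phi] + f_x[\alpha^{-1}\phi]) \ge 2\alpha r \ge 0,$$

whence $\Phi_*(r) \ge \frac{1}{2} \sup_{x \in X} \inf_{\phi \in \mathcal{F}, \alpha > 0} \Phi_r(x, x; \phi, \alpha) \ge 0$. The concavity of $\Phi_*(r)$
on the nonnegative ray follows immediately from the representation, yielded
by (3.1),

$$\Phi_*(r) = \frac{1}{2} \inf_{\phi \in \mathcal{F}, \alpha > 0} \left[2\alpha r + \sup_{x, y \in X} \Phi_0(x, y; \phi, \alpha)\right]$$

of $\Phi_*(r)$ as the infimum of a family of affine functions of $r$.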
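
Lemma 3.1. One has

$$\mathrm{RiskA}(\epsilon) \le \Phi_*(\ln(2/\epsilon)).$$

Proof. Given $\delta > 0$ and $\epsilon \in (0, 1/4)$, let us build an affine estimate
with $\epsilon$-risk $\le R = \Phi_*(\ln(2/\epsilon)) + \delta/2$, namely, as follows. By (3.1), there
exist $\phi_* \in \mathcal{F}$ and $\alpha_* > 0$, such that

$$2\Phi_*(\ln(2/\epsilon)) + \delta/2 > \max_{x, y \in X} \Phi_{\ln(2/\epsilon)}(x, y; \phi_*, \alpha_*) = U + V,$$
$$U = \max_{x \in X}\left[g^T x + \alpha_* \ln\left(\int_\Omega \exp\{-\alpha_*^{-1}\phi_*(\omega)\}\, p_{A(x)}(\omega) P(d\omega)\right) + \alpha_* \ln(2/\epsilon)\right],$$
$$V = \max_{y \in X}\left[-g^T y + \alpha_* \ln\left(\int_\Omega \exp\{\alpha_*^{-1}\phi_*(\omega)\}\, p_{A(y)}(\omega) P(d\omega)\right) + \alpha_* \ln(2/\epsilon)\right].$$

Setting $c = \frac{U - V}{2}$, we have

$$\max_{x \in X}\left[g^T x + \alpha_* \ln\left(\int_\Omega \exp\{-\alpha_*^{-1}[\phi_*(\omega) + c]\}\, p_{A(x)}(\omega) P(d\omega)\right) + \alpha_* \ln(2/\epsilon)\right] = U - c = \frac{U + V}{2} < \Phi_*(\ln(2/\epsilon)) + \delta/4 = R - \delta/4,$$
$$\max_{y \in X}\left[-g^T y + \alpha_* \ln\left(\int_\Omega \exp\{\alpha_*^{-1}[\phi_*(\omega) + c]\}\, p_{A(y)}(\omega) P(d\omega)\right) + \alpha_* \ln(2/\epsilon)\right] = V + c = \frac{U + V}{2} < \Phi_*(\ln(2/\epsilon)) + \delta/4 = R - \delta/4$$

or, equivalently,

$$\max_{x \in X} \ln\left(\int_\Omega \exp\{\alpha_*^{-1}[g^T x - (\phi_*(\omega) + c) - R]\}\, p_{A(x)}(\omega) P(d\omega)\right) < \ln(\epsilon/2) - \frac{\delta}{4\alpha_*} =: \ln(\epsilon'/2),$$
$$\max_{y \in X} \ln\left(\int_\Omega \exp\{\alpha_*^{-1}[(\phi_*(\omega) + c) - R - g^T y]\}\, p_{A(y)}(\omega) P(d\omega)\right) < \ln(\epsilon'/2),$$

that is,

(a) $\forall x \in X$: $\int_\Omega \exp\{\alpha_*^{-1}[g^T x - (\phi_*(\omega) + c) - R]\}\, p_{A(x)}(\omega) P(d\omega) < \epsilon'/2$,

(b) $\forall y \in X$: $\int_\Omega \exp\{\alpha_*^{-1}[(\phi_*(\omega) + c) - R - g^T y]\}\, p_{A(y)}(\omega) P(d\omega) < \epsilon'/2$.

For a given $x \in X$, the exponential factor in (a) is nonnegative and is $\ge 1$ for all $\omega$
such that $g^T x - [\phi_*(\omega) + c] \ge R$; therefore, (a) implies that
$\mathrm{Prob}_{\omega \sim p_{A(x)}(\cdot)}\{g^T x \ge [\phi_*(\omega) + c] + R\} < \epsilon'/2$, for every $x \in X$. By similar reasons, (b) implies that
$\mathrm{Prob}_{\omega \sim p_{A(x)}(\cdot)}\{g^T x \le [\phi_*(\omega) + c] - R\} < \epsilon'/2$, for all $x \in X$. Since by construction
$\epsilon' < \epsilon$, we see that the $\epsilon$-risk of the affine estimate $\hat g(\omega) = \phi_*(\omega) + c$
is $\le R$, as claimed. □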



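Lemma 3.2. One has

(3.2)  $$\delta \in (0, 1) \;\Rightarrow\; \mathrm{Risk}_*(\delta^2/4) \ge \Phi_*(\ln(1/\delta)),$$

whence also

(3.3)  $$\epsilon \in (0, 1/4) \;\Rightarrow\; \mathrm{Risk}_*(\epsilon) \ge \frac{\ln(1/(4\epsilon))}{2\ln(2/\epsilon)}\, \Phi_*(\ln(2/\epsilon)).$$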



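Proof. To prove (3.2), let us set $\rho = \ln(1/\delta)$. The function $\underline{\Phi}_\rho(x, y) =
\inf_{\phi \in \mathcal{F}, \alpha > 0} \Phi_\rho(x, y; \phi, \alpha)$ takes values in $\{-\infty\} \cup \mathbb{R}$, is upper semicontinuous
(since $\Phi_\rho$ is continuous) and is not identically $-\infty$ (in fact, it is even $\ge 0$
when $y = x$). Thus, $\underline{\Phi}_\rho$ achieves its maximum on $X \times X$ at a certain point
$(\bar x, \bar y)$, and for any $(\alpha > 0, \phi \in \mathcal{F})$:

(3.4)  $$\Phi_\rho(\bar x, \bar y; \phi, \alpha) \ge \underline{\Phi}_\rho(\bar x, \bar y) = \sup_{x, y \in X}\, \inf_{\phi \in \mathcal{F}, \alpha > 0} \Phi_\rho(x, y; \phi, \alpha) = 2\Phi_*(\rho),$$

where the concluding equality is given by (3.1). Since $(\mathcal{D}, \mathcal{F})$ is a good pair,
setting $\mu = A(\bar x)$, $\nu = A(\bar y)$ and $\bar\phi(\omega) = \frac{1}{2}\ln(p_\mu(\omega)/p_\nu(\omega))$, we get $\bar\phi \in \mathcal{F}$,
which combines with (3.4), applied with $\phi = \alpha\bar\phi$, to imply that

$$\forall (\alpha > 0):\quad 2\Phi_*(\rho) \le g^T \bar x - g^T \bar y + \alpha\left[\ln\left(\int_\Omega \exp\{-\alpha^{-1}[\alpha\bar\phi(\omega)]\}\, p_\mu(\omega) P(d\omega)\right) + \ln\left(\int_\Omega \exp\{\alpha^{-1}[\alpha\bar\phi(\omega)]\}\, p_\nu(\omega) P(d\omega)\right) + 2\rho\right]$$
$$= g^T \bar x - g^T \bar y + 2\alpha\left[\ln\left(\int_\Omega \sqrt{p_\mu(\omega) p_\nu(\omega)}\, P(d\omega)\right) + \rho\right].$$

The resulting inequality holds true for all $\alpha > 0$, meaning that

(3.5)
(a) $g^T \bar x - g^T \bar y \ge 2\Phi_*(\rho) = 2\Phi_*(\ln(1/\delta))$,
(b) $\int_\Omega \sqrt{p_\mu(\omega) p_\nu(\omega)}\, P(d\omega) \ge \exp\{-\rho\} = \delta$.

Now assume, in contrast to what should be proved, that $\mathrm{Risk}_*(\delta^2/4) <
\Phi_*(\ln(1/\delta))$. Then, there exist $R' < \Phi_*(\ln(1/\delta))$, $\delta' < \delta^2/4$ and an estimate
$\hat g(\omega)$ such that

$$\mathrm{Prob}_{\omega \sim p_{A(x)}(\cdot)}\{|\hat g(\omega) - g^T x| > R'\} \le \delta' \qquad \forall x \in X.$$

Now, consider two hypotheses $\Pi_1$, $\Pi_2$ on the distribution of $\omega$ stating that the
densities of the distribution with regard to $P$ are $p_\mu$ and $p_\nu$, respectively.
Consider a procedure for distinguishing between the hypotheses as follows:
after $\omega$ is observed, we compare $\hat g(\omega)$ with $\bar g = \frac{1}{2}[g^T \bar x + g^T \bar y]$; if $\hat g(\omega) \ge \bar g$,
we accept $\Pi_1$, otherwise we accept $\Pi_2$. Note that by (3.5)(a) and due to
$R' < \Phi_*(\ln(1/\delta))$, the probability to accept $\Pi_2$ when $\Pi_1$ is true is $\le$ the
probability for $\hat g(\omega)$ to deviate from $g^T \bar x$ by more than $R'$, that is, it is $\le \delta'$.
Similarly, the probability to accept $\Pi_1$ when $\Pi_2$ is true is $\le \delta'$. Now, let $\Omega_1$
be the part of $\Omega$ where our hypotheses testing routine accepts $\Pi_1$, so that
in $\Omega_2 = \Omega \setminus \Omega_1$ the routine accepts $\Pi_2$. As we just have seen,

$$\int_{\Omega_1} p_\nu(\omega) P(d\omega) \le \delta', \qquad \int_{\Omega_2} p_\mu(\omega) P(d\omega) \le \delta',$$

whence, by the Cauchy-Schwarz inequality,

$$\int_\Omega \sqrt{p_\mu(\omega) p_\nu(\omega)}\, P(d\omega) = \sum_{i=1}^2 \int_{\Omega_i} \sqrt{p_\mu(\omega) p_\nu(\omega)}\, P(d\omega) \le \sum_{i=1}^2 \left(\int_{\Omega_i} p_\mu(\omega) P(d\omega)\right)^{1/2} \left(\int_{\Omega_i} p_\nu(\omega) P(d\omega)\right)^{1/2} \le 2\sqrt{\delta'} < 2\sqrt{\delta^2/4} = \delta.$$

The resulting inequality $\int_\Omega \sqrt{p_\mu(\omega) p_\nu(\omega)}\, P(d\omega) < \delta$ contradicts (3.5)(b); we
have arrived at a desired contradiction. (3.2) is proved.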

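To prove (3.3), let us set $\delta = 2\sqrt{\epsilon}$, so that $\mathrm{Risk}_*(\epsilon) = \mathrm{Risk}_*(\delta^2/4) \ge
\Phi_*(\ln(1/\delta)) = \Phi_*(\frac{1}{2}\ln(\frac{1}{4\epsilon}))$, where the concluding $\ge$ is due to (3.2). Now
recall that $\Phi_*(r)$ is a nonnegative and concave function of $r \ge 0$, so that
$\Phi_*(tr) \ge t\Phi_*(r)$, for all $r \ge 0$ and $0 \le t \le 1$. We therefore have

$$\Phi_*\left(\frac{1}{2}\ln\left(\frac{1}{4\epsilon}\right)\right) \ge \frac{\ln(1/(4\epsilon))}{2\ln(2/\epsilon)}\, \Phi_*\left(\ln\left(\frac{2}{\epsilon}\right)\right),$$

and we arrive at (3.3). □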

Lemmas 3.1 and 3.2 clearly imply Theorem 3.1. □ 
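
The hypothesis-testing step in the proof of Lemma 3.2 rests on the fact that if some test separates two densities with both error probabilities at most $\delta'$, then their Hellinger affinity is at most $2\sqrt{\delta'}$. The sketch below checks this by brute force on a small finite sample space, enumerating every possible acceptance region. The two distributions `p` and `q` are arbitrary illustrative choices, not data from the paper.

```python
import itertools
import math

def hellinger_affinity(p, q):
    """Sum over outcomes of sqrt(p_i * q_i)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def worst_gap(p, q):
    """Minimum of 2*sqrt(max error) - affinity over all acceptance regions."""
    n = len(p)
    aff = hellinger_affinity(p, q)
    gap = float("inf")
    for mask in itertools.product((0, 1), repeat=n):
        # mask[i] == 1 means outcome i lies in Omega_1 (hypothesis 1 accepted).
        err_accept_1_under_2 = sum(q[i] for i in range(n) if mask[i])
        err_accept_2_under_1 = sum(p[i] for i in range(n) if not mask[i])
        delta = max(err_accept_1_under_2, err_accept_2_under_1)
        gap = min(gap, 2.0 * math.sqrt(delta) - aff)
    return gap

p = [0.5, 0.3, 0.15, 0.05]  # hypothetical density under hypothesis 1
q = [0.1, 0.2, 0.3, 0.4]    # hypothetical density under hypothesis 2
gap = worst_gap(p, q)       # nonnegative: no test beats the affinity bound
```

A nonnegative `gap` confirms that no acceptance region on this sample space violates the bound, which is the contradiction mechanism used to prove (3.2).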

Remark 3.1. Lemmas 3.1 and 3.2 provide certain information even
beyond the case when the pair $(\mathcal{D}, \mathcal{F})$ is good, specifically, that:

(i) The $\epsilon$-risk of an affine estimate can be made arbitrarily close to the
quantity

$$\Phi_+(\epsilon) = \frac{1}{2} \inf_{\phi \in \mathcal{F}, \alpha > 0}\, \sup_{x, y \in X} \Phi_{\ln(2/\epsilon)}(x, y; \phi, \alpha)$$

(cf. Lemma 3.1);

(ii) We have $\mathrm{Risk}_*(\epsilon) \ge \Phi_-(\epsilon) = \frac{1}{2} \sup_{x, y \in X} \inf_{\phi \in \mathcal{F}, \alpha > 0} \Phi_{(1/2)\ln(1/(4\epsilon))}(x, y; \phi, \alpha)$
(cf. Lemma 3.2).

As is seen from the proofs of Lemmas 3.1 and 3.2, both these statements
hold true without the goodness assumption. The role of the latter is in
ensuring that $\Phi_+(\epsilon)$ is within an absolute constant factor of $\Phi_-(\epsilon)$.

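Lemma 3.2 implies the following result.

Proposition 3.1. Under the premise of Theorem 3.1, the Hellinger
affinity

$$\mathrm{AffH}(\mu, \nu) = \int_\Omega \sqrt{p_\mu(\omega) p_\nu(\omega)}\, P(d\omega)$$

is a continuous and log-concave function on $\mathcal{M} \times \mathcal{M}$, and the quantity $\Phi_*(r)$,
$r \ge 0$, admits the following representation:

(3.6)  $$2\Phi_*(r) = \max\{g^T x - g^T y : \mathrm{AffH}(A(x), A(y)) \ge \exp\{-r\},\ x, y \in X\}.$$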

We see that the upper bound $\Phi_*(\ln(2/\epsilon))$ on $\mathrm{RiskA}(\epsilon)$ stated in Theorem 3.1
admits a very transparent interpretation: this bound is the maximal
variation $\frac{1}{2}\max_{x, y}[g^T x - g^T y]$ of the estimated functional on
the set of pairs $x, y \in X$ with the associated distributions "close" to each
other, namely, such that $\mathrm{AffH}(A(x), A(y)) \ge \epsilon/2$. Observe that asymptotically
(when $r$ becomes small), $2\Phi_*(r)$ is equivalent to the modulus of continuity
$\omega(r, X)$ of $g$ with regard to the Hellinger distance, introduced in [7]
(see footnote 2).

Footnote 2. Recall that we consider here the case of one observation.

Proof of Proposition 3.1. By exactly the same argument as in the
proof of Theorem 3.1, the function $\Psi(\mu, \nu; \phi) : (\mathcal{M} \times \mathcal{M}) \times \mathcal{F} \to \mathbb{R}$,

$$\Psi(\mu, \nu; \phi) = \ln\left(\int_\Omega \exp\{-\phi(\omega)\}\, p_\mu(\omega) P(d\omega)\right) + \ln\left(\int_\Omega \exp\{\phi(\omega)\}\, p_\nu(\omega) P(d\omega)\right),$$

is well defined and continuous on its domain, and this function is convex in
$\phi$ and concave in $(\mu, \nu)$. We claim that

(3.7)  $$\ln(\mathrm{AffH}(\mu, \nu)) = \frac{1}{2} \min_{\phi \in \mathcal{F}} \Psi(\mu, \nu; \phi),$$

which would imply that $\ln(\mathrm{AffH}(\cdot))$ is indeed a finite concave function on
$\mathcal{M} \times \mathcal{M}$ and as such is continuous (recall that $\mathcal{M}$ is open). To justify our
claim, note that, for fixed $\mu, \nu \in \mathcal{M}$, setting $\bar\phi = \frac{1}{2}\ln(p_\mu/p_\nu)$, we get a function
from $\mathcal{F}$ such that $\Psi(\mu, \nu; \bar\phi) = 2\ln(\mathrm{AffH}(\mu, \nu))$. To complete the verification
of (3.7), it suffices to demonstrate that $\Psi(\mu, \nu; \phi) \ge \Psi(\mu, \nu; \bar\phi)$ whenever
$\phi \in \mathcal{F}$, which is immediate, since setting $\phi = \bar\phi + \Delta$, we have, by the
Cauchy-Schwarz inequality,

$$\exp\{\Psi(\mu, \nu; \bar\phi)/2\} = \int_\Omega \left[(p_\mu(\omega) p_\nu(\omega))^{1/4} \exp\{-\Delta(\omega)/2\}\right] \times \left[(p_\mu(\omega) p_\nu(\omega))^{1/4} \exp\{\Delta(\omega)/2\}\right] P(d\omega)$$
$$\le \left[\int_\Omega \sqrt{p_\mu(\omega) p_\nu(\omega)}\, \exp\{-\Delta(\omega)\} P(d\omega)\right]^{1/2} \left[\int_\Omega \sqrt{p_\mu(\omega) p_\nu(\omega)}\, \exp\{\Delta(\omega)\} P(d\omega)\right]^{1/2}$$
$$= \exp\{\Psi(\mu, \nu; \phi)/2\}.$$
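Now, note that by (3.1)

$$2\Phi_*(r) = \sup_{x, y \in X}\, \inf_{\phi \in \mathcal{F}, \alpha > 0} \left[g^T x - g^T y + \alpha\Psi(A(x), A(y); \alpha^{-1}\phi) + 2\alpha r\right]$$
$$= \sup_{x, y \in X} \left\{g^T x - g^T y + \inf_{\alpha > 0} \alpha\left[\inf_{\phi \in \mathcal{F}} \Psi(A(x), A(y); \alpha^{-1}\phi) + 2r\right]\right\}$$
$$= \sup_{x, y \in X} \left\{g^T x - g^T y + \inf_{\alpha > 0} \alpha\left[\inf_{\psi \in \mathcal{F}} \Psi(A(x), A(y); \psi) + 2r\right]\right\}$$
$$= \sup_{x, y \in X} \left\{g^T x - g^T y + \inf_{\alpha > 0} \alpha\left[2\ln(\mathrm{AffH}(A(x), A(y))) + 2r\right]\right\} \qquad \text{[see (3.7)]}$$
$$= \max\{g^T x - g^T y : \mathrm{AffH}(A(x), A(y)) \ge \exp\{-r\},\ x, y \in X\},$$

where the last equality holds because the inner infimum over $\alpha > 0$ equals
$0$ when $\ln(\mathrm{AffH}(A(x), A(y))) + r \ge 0$ and equals $-\infty$ when
$\ln(\mathrm{AffH}(A(x), A(y))) + r < 0$.
□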
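
Identity (3.7) can be illustrated on the Poisson family of Example 2, where $\mathcal{F}$ consists of the functions $\phi(i) = a i + b$. The sketch below assumes the sign convention in which $\Psi$ pairs $\exp\{-\phi\}$ with $p_\mu$ and $\exp\{\phi\}$ with $p_\nu$; under it, the log-moment formula of Example 2 gives $\Psi = -\mu + \mu e^{-a} - \nu + \nu e^{a}$ (the constant $b$ cancels). The series truncation, the grid and the parameter values are arbitrary illustrative choices.

```python
import math

def poisson_pmf(i, mu):
    """Poisson probability, computed in logs for numerical stability."""
    return math.exp(i * math.log(mu) - mu - math.lgamma(i + 1))

def hellinger_affinity(mu, nu, terms=400):
    """Truncated series sum_i sqrt(p_mu(i) * p_nu(i))."""
    return sum(math.sqrt(poisson_pmf(i, mu) * poisson_pmf(i, nu))
               for i in range(terms))

def psi(mu, nu, a):
    """Psi(mu, nu; phi) for phi(i) = a*i + b; the constant b cancels."""
    return (-mu + mu * math.exp(-a)) + (-nu + nu * math.exp(a))

mu, nu = 3.0, 5.0
log_aff = math.log(hellinger_affinity(mu, nu))
a_star = 0.5 * math.log(mu / nu)           # slope of (1/2) ln(p_mu / p_nu)
half_min_psi = 0.5 * psi(mu, nu, a_star)   # right-hand side of (3.7)
# Crude grid search over the slope a in [-2, 2] as an independent check.
grid_min = 0.5 * min(psi(mu, nu, -2.0 + 0.001 * k) for k in range(4001))
```

Both `half_min_psi` and `log_aff` equal $-(\sqrt{\mu} - \sqrt{\nu})^2/2$, and the grid minimum approaches the same value from above.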



3.2. The case of multiple observations. In Problem I, our goal was to
estimate $g^T x$ from a single observation $\omega$ of the random variable $\omega \sim p_{A(x)}(\cdot)$
associated with $x$. The result can be immediately extended to the case when
we want to recover $g^T x$ from a sample of independent observations $\omega_1, \ldots, \omega_L$
of random variables $\omega_\ell$ with distributions parameterized by $x$. Specifically,
let $(\Omega_\ell, P_\ell)$ and $(\mathcal{D}_\ell, \mathcal{F}_\ell)$, $1 \le \ell \le L$, be as in Example 4, and let every pair
$(\mathcal{D}_\ell, \mathcal{F}_\ell)$ be good. Assume, further, that $X \subset \mathbb{R}^n$ is a convex compact set and
$A_\ell(x)$ are affine mappings with $A_\ell(X) \subset \mathcal{M}_\ell$. Given a linear form $g^T z$ on
$\mathbb{R}^n$ and a sequence of independent realizations $\omega_\ell \sim p^\ell_{A_\ell(x)}(\cdot)$, $\ell = 1, \ldots, L$,
we want to recover from these observations the value $g^T x$ of the given linear
form at the "signal" $x$ underlying our observations.

In our current situation, we call a candidate estimate $\hat g(\omega_1, \ldots, \omega_L)$ affine
if it is of the form

(3.8)  $$\hat g(\omega_1, \ldots, \omega_L) = \sum_{\ell=1}^L \phi_\ell(\omega_\ell),$$

where $\phi_\ell \in \mathcal{F}_\ell$, $\ell = 1, \ldots, L$. Note that setting $(\mathcal{D}, \mathcal{F}) = \bigotimes_{\ell=1}^L (\mathcal{D}_\ell, \mathcal{F}_\ell)$, we
reduce the situation to the one we have already considered. In particular,
Theorem 3.1 along with the proof of Lemma 3.1 implies the following result
(where the $\epsilon$-risks of an estimate, the minimax optimal one and the
affine-minimax optimal one, are defined exactly as in the single-observation case).

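Theorem 3.2. In the situation just described, for $r > 0$, let

$$\Phi_r(x, y; \phi, \alpha) = \alpha \sum_{\ell=1}^L \left[\ln\left(\int_{\Omega_\ell} \exp\{-\alpha^{-1}\phi_\ell(\omega_\ell)\}\, p^\ell_{A_\ell(x)}(\omega_\ell) P_\ell(d\omega_\ell)\right) + \ln\left(\int_{\Omega_\ell} \exp\{\alpha^{-1}\phi_\ell(\omega_\ell)\}\, p^\ell_{A_\ell(y)}(\omega_\ell) P_\ell(d\omega_\ell)\right)\right] + g^T x - g^T y + 2\alpha r : Z \times \mathcal{F}_+ \to \mathbb{R},$$
$$Z = X \times X, \qquad \mathcal{F}_+ = \mathcal{F}_1 \times \cdots \times \mathcal{F}_L \times \{\alpha > 0\}.$$

The function $\Phi_r$ is continuous on its domain, concave in the $(x, y)$-argument,
convex in the $(\phi, \alpha)$-argument and possesses a well-defined saddle point value

$$2\Phi_*(r) = \sup_{x, y \in X} \underbrace{\inf_{(\phi, \alpha) \in \mathcal{F}_+} \Phi_r(x, y; \phi, \alpha)}_{\underline{\Phi}_r(x, y)} = \inf_{(\phi, \alpha) \in \mathcal{F}_+} \underbrace{\sup_{x, y \in X} \Phi_r(x, y; \phi, \alpha)}_{\bar\Phi_r(\phi, \alpha)},$$

which is a concave and nonnegative function of $r \ge 0$. Moreover:

(i) For all $\epsilon \in (0, 1/4)$, we have

$$\mathrm{RiskA}(\epsilon) \le \Phi_*(\ln(2/\epsilon)) \le \theta(\epsilon)\, \mathrm{Risk}_*(\epsilon), \qquad \theta(\epsilon) = \frac{2\ln(2/\epsilon)}{\ln(1/(4\epsilon))}.$$

(ii) Given $\epsilon \in (0, 1/4)$ and $\delta > 0$, in order to build an affine estimate with
$\epsilon$-risk not exceeding $[\Phi_*(\ln(2/\epsilon)) + \delta]$, it suffices to find
$\alpha_* > 0$ and $\phi^*_\ell \in \mathcal{F}_\ell$, $1 \le \ell \le L$, such that

$$\bar\Phi_{\ln(2/\epsilon)}(\phi^*, \alpha_*) \le 2\Phi_*(\ln(2/\epsilon)) + \delta,$$

to compute the quantity

$$c = \frac{1}{2} \max_{x \in X}\left[g^T x + \alpha_* \sum_{\ell=1}^L \ln\left(\int_{\Omega_\ell} \exp\{-\alpha_*^{-1}\phi^*_\ell(\omega_\ell)\}\, p^\ell_{A_\ell(x)}(\omega_\ell) P_\ell(d\omega_\ell)\right)\right] - \frac{1}{2} \max_{y \in X}\left[-g^T y + \alpha_* \sum_{\ell=1}^L \ln\left(\int_{\Omega_\ell} \exp\{\alpha_*^{-1}\phi^*_\ell(\omega_\ell)\}\, p^\ell_{A_\ell(y)}(\omega_\ell) P_\ell(d\omega_\ell)\right)\right]$$

and to set

(3.9)  $$\hat g(\omega_1, \ldots, \omega_L) = \sum_{\ell=1}^L \phi^*_\ell(\omega_\ell) + c.$$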

Remark 3.2. Computing the "nearly optimal" affine estimate (3.9) reduces
to convex programming and thus can be carried out efficiently, provided
that we are given explicit descriptions of:

• the linear spaces $\mathcal{F}_\ell$, $\ell = 1, \ldots, L$ (as is the case, e.g., in Examples 1-3),

• and $X$ (e.g., by a list of efficiently computable convex constraints which
cut $X$ out of $\mathbb{R}^n$),

and are able to compute efficiently the value of $\Phi_r$ at a point.



Remark 3.3. Assume that the observations $\omega_\ell$, $\ell_0 \le \ell \le \ell_1$, are copies
of the same random variable [i.e., $\Omega_\ell, P_\ell, \mathcal{D}_\ell, \mathcal{F}_\ell, A_\ell(\cdot)$ are independent of $\ell$
for $\ell_0 \le \ell \le \ell_1$]. Then, the convex function $\bar\Phi_r(\phi_1, \ldots, \phi_L, \alpha)$ is symmetric
with regard to the arguments $\phi_\ell \in \mathcal{F}_\ell$, $\ell_0 \le \ell \le \ell_1$, and therefore, when
building the estimate (3.9) we lose nothing when restricting ourselves to
$\phi$'s satisfying $\phi_\ell = \phi_{\ell_0}$, $\ell_0 \le \ell \le \ell_1$, which reduces the computational
effort of building $\alpha_*$, $\phi^*_\ell$.

3.2.1. Illustration. Consider the toy problem where we want to recover
the probability $p$ of getting 1 from a Bernoulli distribution, given $L$ independent
realizations $\omega_1, \ldots, \omega_L$ of the associated random variable. To handle
the problem, we specialize our general setup as follows:

• $(\Omega_\ell, P_\ell)$, $1 \le \ell \le L$, are identical to the two-point set $\{0; 1\}$ with the counting
measure;

• $\mathcal{M}$ is the interval $(0, 1)$, and $p_\mu(1) = 1 - p_\mu(0) = \mu$, $\mu \in \mathcal{M}$;

• $X$ is a compact convex subset in $\mathcal{M}$, say, the segment [1e-16, 1 - 1e-16],
and $A(x) = x$.

Table 1
Recovering the parameter of a Bernoulli distribution

  $\epsilon$    $L$     $\gamma$    $\delta$    Upper risk   Lower risk   Ratio of   $\theta(\epsilon)$
                                               bound        bound        bounds
  0.05     10     2.91e-1    4.18e-2    3.61e-1      2.49e-1      1.45       4.58
  0.05     100    4.13e-2    9.17e-3    1.33e-1      8.19e-2      1.63       4.58
  0.05     1000   4.29e-3    9.91e-4    4.29e-2      2.60e-2      1.65       4.58
  0.01     10     3.58e-1    2.83e-2    4.04e-1      3.29e-1      1.23       3.29
  0.01     100    5.83e-2    8.84e-3    1.59e-1      1.15e-1      1.38       3.29
  0.01     1000   6.15e-3    9.88e-4    5.13e-2      3.67e-2      1.40       3.29
  0.001    10     4.19e-1    1.61e-2    4.42e-1      3.98e-1      1.11       2.75
  0.001    100    8.15e-2    8.37e-3    1.88e-1      1.51e-1      1.24       2.75
  0.001    1000   8.79e-3    9.82e-4    6.14e-2      4.88e-2      1.26       2.75


Invoking Remark 3.3, we lose nothing when restricting ourselves to affine
estimates of the form (3.8) with mutually identical functions $\phi_\ell(\cdot)$, $1 \le \ell \le L$,
that is, with the estimates

$$\hat g(\omega_1, \ldots, \omega_L) = \gamma + \delta \sum_{\ell=1}^L \omega_\ell.$$

Invoking Theorem 3.2, the coefficients $\gamma$ and $\delta$ are readily given by the
$\phi$-component of the saddle point (max in $x, y \in X$, min in $\phi = [\phi_0; \phi_1] \in \mathbb{R}^2$
and $\alpha > 0$) of the convex-concave function

$$x - y + \alpha\left[L \ln(e^{-\phi_0/\alpha}(1 - x) + e^{-\phi_1/\alpha} x) + L \ln(e^{\phi_0/\alpha}(1 - y) + e^{\phi_1/\alpha} y) + 2\ln(2/\epsilon)\right];$$

the (guaranteed upper bound on the) $\epsilon$-risk of this estimate is half of the
corresponding saddle point value. The saddle point (it is easily seen that it does
exist) can be computed with high accuracy by standard convex programming
techniques. In Table 1, we present the nearly optimal affine estimates along
with the corresponding risks. In the table, the upper risk bound is the one
guaranteed by Theorem 3.2 and the lower risk bound is the largest $d$ such
that the hypotheses "$p = 0.5 + d$" and "$p = 0.5 - d$" cannot be distinguished
from $L$ independent observations of a random variable $\sim$ Bernoulli($p$) with
the sum of probabilities of errors $< 2\epsilon$ [this easily computable quantity is a
lower bound on the minimax optimal $\epsilon$-risk $\mathrm{Risk}_*(\epsilon)$], and
$\theta(\epsilon) = \frac{2\ln(2/\epsilon)}{\ln(1/(4\epsilon))}$
is the theoretical upper bound on the "level of nonoptimality" of our estimate.
As could be guessed in advance, for large $L$, the near-optimal affine
estimate is close to the trivial estimate $\frac{1}{L}\sum_{\ell=1}^L \omega_\ell$.
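
The upper risk bound column of Table 1 can be reproduced in closed form under two assumptions that are not spelled out in the text: first, that representation (3.6) is applied to the direct product of $L$ Bernoulli pairs, whose Hellinger affinity is the $L$-th power of the single-observation affinity $\sqrt{xy} + \sqrt{(1-x)(1-y)}$; second, that by symmetry and log-concavity the maximum is attained at $x = 1/2 + d$, $y = 1/2 - d$, where the affinity equals $\sqrt{1 - 4d^2}$. The bound is then the largest $d$ with $(1 - 4d^2)^{L/2} \ge \epsilon/2$. The function name is illustrative.

```python
import math

def bernoulli_upper_risk_bound(eps, num_obs):
    """Largest d with (1 - 4*d**2)**(num_obs/2) >= eps/2.

    Assumes the symmetric maximizer x = 1/2 + d, y = 1/2 - d, so that the
    bound Phi_*(ln(2/eps)) = (x - y)/2 = d.
    """
    return 0.5 * math.sqrt(1.0 - (eps / 2.0) ** (2.0 / num_obs))

bounds = {(eps, L): bernoulli_upper_risk_bound(eps, L)
          for eps in (0.05, 0.01, 0.001) for L in (10, 100)}
# For example, bounds[(0.05, 10)] is about 0.361 and bounds[(0.01, 100)]
# is about 0.159, in line with the corresponding rows of Table 1.
```

This shortcut yields only the risk bound; recovering the coefficients $\gamma$ and $\delta$ of the estimate still requires solving the saddle point problem itself.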



4. Applications. In this section, we present some applications of Theorems
3.1 and 3.2.

4.1. Positron emission tomography. The positron emission tomography
(PET) is a noninvasive diagnostic tool allowing us to visualize not only 
the anatomy of tissues in a body, but their functioning as well. In PET, 
a patient is administered a radioactive tracer chosen in such a way that it 
concentrates in the areas of interest (e.g., those of high metabolic activity in 
early diagnosis of cancer). The tracer disintegrates, emitting positrons which 
then annihilate with nearby electrons to produce pairs of photons flying at 
the speed of light in opposite directions; the orientation of the resulting line 
of response (LOR) is completely random. The patient is placed in a cylinder 
with the surface split into small detector cells. When two of the detectors 
are hit by photons "nearly simultaneously" — within an appropriately chosen 
short time window — the event indicates that somewhere at a line crossing 
the detectors a disintegration act took place. Such an event is registered, and 
the data collected by the PET device form a list of the number of events 
registered in every one of the bins (pairs of detectors) in the course of a given 
time t. The goal of a PET reconstruction algorithm is to recover the density 
of the tracer from this data. The standard mathematical model of PET is as 
follows. After discretization of the field of view, there are $n$ voxels (small 3D
cubes) assigned with nonnegative (and unknown) amounts $x_i$ of the tracer,
$i = 1, \ldots, n$. The number of LORs emanating from a voxel $i$ is a realization
of a Poisson random variable with parameter $x_i$, and these variables for
different voxels are independent. Every LOR emanating from a voxel $i$ is
subject to a "lottery," which decides in which bin (pair of detectors) it will
be registered, or whether it will be registered at all: some LORs can intersect
the surface of the cylinder only in one point or not intersect it at all and thus
are missed. The role of the lottery is played by the random orientation of
the LOR in question, and outcomes of different lotteries are independent.
The probabilities $q_{i\ell}$ for a LOR emanating from voxel $i$ to be registered in
bin $\ell$ are known (they are readily given by the geometry of the device).
With this model, the data registered by PET is a realization of a random
vector $(\omega_1, \ldots, \omega_L)$ ($L$ is the total number of bins) with independent
Poisson-distributed coordinates, the parameter of the Poisson distribution
associated with $\omega_\ell$ being

$$\sum_{i=1}^{n} x_i q_{i\ell}.$$
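The observation model just described can be simulated directly. The following is a minimal sketch, not part of the original text: the toy tracer amounts `x`, the registration probabilities `q`, and the function names are hypothetical, and the sampler is Knuth's multiplication method, which is adequate only for small Poisson parameters.

```python
import math
import random

def bin_rates(x, q):
    """Poisson parameter of every bin: rate[l] = sum_i x[i] * q[i][l]."""
    n, num_bins = len(q), len(q[0])
    return [sum(x[i] * q[i][l] for i in range(n)) for l in range(num_bins)]

def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate by Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_pet(x, q, rng):
    """One realization of the vector of bin counts (omega_1, ..., omega_L)."""
    return [sample_poisson(rate, rng) for rate in bin_rates(x, q)]

# Hypothetical toy device: 3 voxels, 2 bins. Each row of q sums to less
# than 1; the remaining probability mass corresponds to missed LORs.
x = [4.0, 1.0, 2.5]
q = [[0.5, 0.2], [0.1, 0.6], [0.3, 0.3]]
rates = bin_rates(x, q)
omega = simulate_pet(x, q, random.Random(0))
print("bin rates:", rates)
print("observed counts:", omega)
```

The counts in different bins are sampled independently, which matches the model: splitting independent Poisson streams by independent lotteries yields independent Poisson counts per bin.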



Assume that our a priori information on $x$ allows us to point out a convex
compact set $X \subset \{x \in \mathbb{R}^n : x > 0\}$, such that $x \in X$. Assuming without loss
of generality that $\sum_i q_{i\ell} > 0$ for every $\ell$ (indeed, we can eliminate all bins
$\ell$ which never register LORs) and invoking Example 2, we find ourselves
in the situation of Section 3.2. It follows that in order to evaluate a given
linear form $g^T x$ of the unknown tracer density $x$, we can use the construction
from Theorem 3.2 to build a near-optimal affine estimate of $g^T x$. The recipe
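suggested to this end by Theorem 3.2 reads as follows: the estimate is of the
form

$$\hat g_\varepsilon(\omega) = \sum_{\ell=1}^{L} \gamma_\ell^* \omega_\ell + c_*,$$

where $\omega_\ell$ is the number of LORs registered in bin $\ell$ and $\gamma^* = [\gamma_1^*; \ldots; \gamma_L^*]$, $c_*$
are given by an optimal solution $(\gamma^*, \alpha_*)$ to the convex optimization problem

$$\min_{\alpha > 0,\, \gamma} \Phi_r(\gamma, \alpha),$$

$$\Phi_r(\gamma, \alpha) = \max_{x, y \in X} \Big\{ g^T x - g^T y + \alpha \Big[ \sum_{\ell=1}^{L} \big[ q_\ell(x) \exp\{-\alpha^{-1}\gamma_\ell\} + q_\ell(y) \exp\{\alpha^{-1}\gamma_\ell\} \big] - q(x) - q(y) \Big] + 2\alpha r \Big\},$$

$$r = \ln(2/\varepsilon), \qquad q_\ell(z) = \sum_{i=1}^{n} q_{i\ell} z_i, \qquad q(z) = \sum_{\ell=1}^{L} q_\ell(z).$$

It is easily seen that the problem is solvable with

$$c_* = \frac{1}{2} \max_{x \in X} \Big\{ g^T x + \alpha_* \Big[ -q(x) + \sum_{\ell=1}^{L} q_\ell(x) \exp\{-\alpha_*^{-1}\gamma_\ell^*\} \Big] \Big\} - \frac{1}{2} \max_{y \in X} \Big\{ -g^T y + \alpha_* \Big[ -q(y) + \sum_{\ell=1}^{L} q_\ell(y) \exp\{\alpha_*^{-1}\gamma_\ell^*\} \Big] \Big\}.$$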
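The exponential terms in the expressions above have the shape of the log-moment generating function of a Poisson random variable: for $\omega \sim \mathrm{Poisson}(\mu)$ one has $\ln \mathbf{E}\exp\{t\omega\} = \mu(e^t - 1)$, so that with independent coordinates $\alpha \ln \mathbf{E}\exp\{\alpha^{-1}\gamma^T\omega\} = \alpha\sum_\ell \mu_\ell(\exp\{\alpha^{-1}\gamma_\ell\} - 1)$. The sketch below checks this identity numerically; the function names and the test values are illustrative assumptions, not taken from the text.

```python
import math

def poisson_log_mgf_series(mu, t, terms=200):
    """ln E exp{t * omega} for omega ~ Poisson(mu), by direct summation."""
    total = 0.0
    term = math.exp(-mu)  # P(omega = 0) * exp(t * 0)
    for k in range(terms):
        total += term
        term *= mu * math.exp(t) / (k + 1)  # move from k to k + 1
    return math.log(total)

def poisson_log_mgf_closed(mu, t):
    """Closed form of the same quantity: mu * (e^t - 1)."""
    return mu * (math.exp(t) - 1.0)

def scaled_log_mgf(mu, gamma, alpha):
    """alpha * ln E exp{gamma' omega / alpha} for independent Poisson
    coordinates omega_l with parameters mu[l]."""
    return alpha * sum(m * (math.exp(g / alpha) - 1.0) for m, g in zip(mu, gamma))

print(poisson_log_mgf_series(2.85, 0.3), poisson_log_mgf_closed(2.85, 0.3))
print(scaled_log_mgf([2.0, 1.0], [0.4, -0.2], 2.0))
```

With $\mu_\ell = q_\ell(x)$ and $\gamma$ replaced by $-\gamma$, the last function evaluates $\alpha[\sum_\ell q_\ell(x)\exp\{-\alpha^{-1}\gamma_\ell\} - q(x)]$, which is the $x$-dependent bracket appearing in the recipe.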



4.2. Gaussian observations. Now consider the standard problem of recovering
a linear form $g^T x$ of a signal $x$ known to belong to a given convex
compact set $X \subset \mathbb{R}^n$ via indirect observations of the signal corrupted by
Gaussian noise. Without loss of generality, let the model of observations be

$$\omega = Ax + \xi, \qquad \xi \sim \mathcal{N}(0, I_L). \tag{4.1}$$

The associated pair $(\mathcal{D}, \mathcal{F})$ is comprised of the shifts of the standard
Gaussian distribution ($\mathcal{D}$) and all affine forms on $\mathbb{R}^L$ ($\mathcal{F}$) and is good (see Example
3). The affine estimates in the case in question are just the affine functions
of $\omega$. Their near-optimality in this case was established by Donoho [5],
not only for the $\varepsilon$-risk, but for all risks based on the
standard loss functions. We have the following direct corollary of Theorem
3.2 (cf. Theorem 2 and Corollary 1 of [5]):

Proposition 4.1. In the situation in question, the affine estimate $\hat g_\varepsilon(\cdot)$
yielded by Theorem 3.2 is asymptotically ($\varepsilon \to +0$) optimal, specifically,

$$\varepsilon \in (0, 1/2) \;\Rightarrow\; \mathrm{Risk}(\hat g_\varepsilon; \varepsilon) \le \psi(\varepsilon)\, \mathrm{Risk}_*(\varepsilon), \qquad \psi(\varepsilon) = \frac{\sqrt{2\ln(2/\varepsilon)}}{\mathrm{ErfInv}(\varepsilon)}.$$

Here $x = \mathrm{ErfInv}(y)$ stands for the inverse error function, i.e., $y = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp\{-s^2/2\}\, ds$.
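To get a feeling for the factor in Proposition 4.1, it can be tabulated numerically. The sketch below assumes that $\mathrm{ErfInv}(y)$ is the upper-tail quantile of the standard normal distribution and that $\psi(\varepsilon) = \sqrt{2\ln(2/\varepsilon)}/\mathrm{ErfInv}(\varepsilon)$, which is the form implied by the identity $\psi^{-1}(\varepsilon)\, 2\sqrt{2\ln(2/\varepsilon)} = 2\,\mathrm{ErfInv}(\varepsilon)$ used in the proof; the function names are illustrative.

```python
import math
from statistics import NormalDist

def erf_inv(y):
    """Upper-tail quantile: the x with P(N(0,1) > x) = y, for 0 < y < 1.
    By symmetry of the normal law this equals -inv_cdf(y)."""
    return -NormalDist().inv_cdf(y)

def psi(eps):
    """Assumed non-optimality factor sqrt(2 ln(2/eps)) / ErfInv(eps)."""
    return math.sqrt(2.0 * math.log(2.0 / eps)) / erf_inv(eps)

for eps in (0.1, 0.01, 1e-4, 1e-8):
    print(f"eps = {eps:g}: ErfInv = {erf_inv(eps):.4f}, psi = {psi(eps):.4f}")
```

The printed factors decrease toward 1 as $\varepsilon$ shrinks, which is the sense in which the estimate is asymptotically optimal; the decay is slow, since both numerator and denominator grow like $\sqrt{2\ln(1/\varepsilon)}$.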

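Proof. Let $G(\cdot)$ be the density of the $\mathcal{N}(0, I_L)$ distribution. By Theorem
3.2, we have $\mathrm{Risk}(\hat g_\varepsilon; \varepsilon) \le \Phi_*(\ln(2/\varepsilon))$, where, for $r > 0$,

$$2\Phi_*(r) = \max_{x, y \in X} \Psi_r(x, y),$$

$$\begin{aligned}
\Psi_r(x, y) &= \inf_{\phi \in \mathbb{R}^L,\, \alpha > 0} \Big[ g^T x - g^T y
 + \alpha \ln\Big( \int \exp\{-\alpha^{-1}\phi^T \omega\}\, G(\omega - Ax)\, d\omega \Big) \\
&\qquad\qquad + \alpha \ln\Big( \int \exp\{\alpha^{-1}\phi^T \omega\}\, G(\omega - Ay)\, d\omega \Big) + 2\alpha r \Big] \\
&= \inf_{\phi,\, \alpha > 0} \Big[ g^T x - g^T y + \phi^T A(y - x) + \alpha^{-1}\phi^T\phi + 2\alpha r \Big] \\
&= \inf_{\phi} \big\{ g^T x - g^T y + \phi^T A(y - x) + 2\sqrt{2r}\, \|\phi\|_2 \big\} \\
&= \begin{cases} g^T x - g^T y, & \|A(x - y)\|_2 \le 2\sqrt{2r}, \\ -\infty, & \|A(x - y)\|_2 > 2\sqrt{2r}. \end{cases}
\end{aligned}$$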

Thus,

$$\mathrm{Risk}(\hat g_\varepsilon; \varepsilon) \le \Phi_*(\ln(2/\varepsilon)) = \tfrac{1}{2}[g^T \bar x - g^T \bar y] \tag{4.2}$$

for certain $\bar x, \bar y \in X$ with $\|A(\bar x - \bar y)\|_2 \le 2\sqrt{2\ln(2/\varepsilon)}$. It remains to prove that

$$\mathrm{Risk}_*(\varepsilon) \ge \psi^{-1}(\varepsilon)\, \Phi_*(\ln(2/\varepsilon)). \tag{4.3}$$

To this end, assume, on the contrary to what should be proved, that

$$\mathrm{Risk}_*(\varepsilon) < \psi^{-1}(\varepsilon)\, \Phi_*(\ln(2/\varepsilon)) \quad \big( = \psi^{-1}(\varepsilon)\, \tfrac{1}{2}[g^T \bar x - g^T \bar y] \big),$$

and let us lead this assumption to a contradiction. Under our assumption,
there exist $\rho < \psi^{-1}(\varepsilon)\, \tfrac{1}{2}[g^T \bar x - g^T \bar y]$, $\varepsilon' < \varepsilon$ and an estimate $\tilde g$ such that

$$\forall (x \in X): \quad \mathrm{Prob}_{\xi \sim \mathcal{N}(0, I_L)}\{ |\tilde g(Ax + \xi) - g^T x| > \rho \} \le \varepsilon'. \tag{4.4}$$

Observing that $\psi(\varepsilon) > 1$, we see that $2\rho < [g^T \bar x - g^T \bar y]$. Let $\tilde x = \bar x$ and $\tilde y$ be
a convex combination of $\bar x$ and $\bar y$ such that $2\rho = [g^T \tilde x - g^T \tilde y]$. Note that

$$\|A(\tilde x - \tilde y)\|_2 = \frac{2\rho}{[g^T \bar x - g^T \bar y]}\, \|A(\bar x - \bar y)\|_2 < \psi^{-1}(\varepsilon)\, 2\sqrt{2\ln(2/\varepsilon)} = 2\,\mathrm{ErfInv}(\varepsilon).$$

Now, let $\Pi_1$ be the hypothesis that the distribution of an observation (4.1)
comes from $x = \tilde x$, and let $\Pi_2$ be the hypothesis that this distribution comes
from $x = \tilde y$. From (4.4) by the same standard argument as in the proof of
Lemma 3.2, it follows that there exists a routine, based on a single observation
(4.1), for distinguishing between $\Pi_1$ and $\Pi_2$, which rejects $\Pi_i$ when this
hypothesis is true with probability $\le \varepsilon'$, $i = 1, 2$. But it is well known that
two hypotheses on shifts of the standard Gaussian distribution can be
distinguished with the outlined reliability if and only if the
Euclidean distance between the corresponding shifts is at least $2\,\mathrm{ErfInv}(\varepsilon')$.
This condition is not satisfied for our $\Pi_i$, $i = 1, 2$, which correspond to shifts
$A\tilde x$ and $A\tilde y$, since $\|A\tilde x - A\tilde y\|_2 < 2\,\mathrm{ErfInv}(\varepsilon) < 2\,\mathrm{ErfInv}(\varepsilon')$. We have arrived at
a desired contradiction. □
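The testing fact used at the end of the proof can be illustrated numerically. The sketch below assumes $\mathrm{ErfInv}$ is the upper-tail quantile of $\mathcal{N}(0,1)$ and evaluates the error of the test that projects the observation onto the line through the two shifts and thresholds at the midpoint; that this test is the best possible with equal error probabilities is the classical Neyman-Pearson fact, which the code does not prove. Function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def erf_inv(y):
    """Upper-tail quantile: the x with P(N(0,1) > x) = y."""
    return -STD_NORMAL.inv_cdf(y)

def midpoint_test_error(distance):
    """Probability of rejecting a true hypothesis (the same for both) when
    two unit-covariance Gaussian shifts at the given Euclidean distance are
    separated by thresholding the projection at the midpoint."""
    return STD_NORMAL.cdf(-distance / 2.0)

def distinguishable(distance, eps):
    """True when the midpoint test has both error probabilities <= eps."""
    return midpoint_test_error(distance) <= eps

critical = 2.0 * erf_inv(0.05)
print("critical distance for eps' = 0.05:", critical)
print("error at the critical distance:", midpoint_test_error(critical))
print("slightly closer shifts distinguishable?", distinguishable(critical - 0.01, 0.05))
```

In the proof, the two shifts $A\tilde x$ and $A\tilde y$ are strictly closer than $2\,\mathrm{ErfInv}(\varepsilon')$, so no test can reach reliability $\varepsilon'$, which is the contradiction.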



In fact, the reasoning can be slightly simplified and strengthened to yield 
the following result. 

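Proposition 4.2. In the situation of Proposition 4.1, one can build
efficiently an affine estimate $\hat g_\varepsilon$, such that

$$0 < \varepsilon < 1/2 \;\Rightarrow\; \mathrm{Risk}(\hat g_\varepsilon; \varepsilon) \le \frac{\mathrm{ErfInv}(\varepsilon/2)}{\mathrm{ErfInv}(\varepsilon)}\, \mathrm{Risk}_*(\varepsilon)$$

[cf. Proposition 4.1, and note that $\frac{\mathrm{ErfInv}(\varepsilon/2)}{\mathrm{ErfInv}(\varepsilon)} \le \frac{\sqrt{2\ln(2/\varepsilon)}}{\mathrm{ErfInv}(\varepsilon)}$].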
Proof. Let

$$\Psi(x, y; \phi) = g^T x - g^T y + \phi^T A(y - x) + 2\,\mathrm{ErfInv}(\varepsilon/2)\, \|\phi\|_2 : (X \times X) \times \mathbb{R}^L \to \mathbb{R}.$$

$\Psi$ clearly is continuous, convex in $\phi$ and concave in $(x, y)$
on its domain; by the same argument as in the proof of Theorem 3.1, $\Psi$ has
a well-defined saddle point value

$$2\Psi_*(\varepsilon) = \inf_{\phi} \underbrace{\max_{x, y \in X} \Psi(x, y; \phi)}_{\overline{\Psi}(\phi)} = \max_{x, y \in X} \underbrace{\inf_{\phi} \Psi(x, y; \phi)}_{\underline{\Psi}(x, y)}.$$

The function

$$\overline{\Psi}(\phi) = \max_{x, y \in X} [g^T x - g^T y + \phi^T (Ay - Ax)] + 2\,\mathrm{ErfInv}(\varepsilon/2)\, \|\phi\|_2 \ge 2\,\mathrm{ErfInv}(\varepsilon/2)\, \|\phi\|_2$$

is a finite convex function on $\mathbb{R}^L$, which goes to $\infty$ as $\|\phi\|_2 \to \infty$, and
therefore it attains its minimum at a point $\phi_*$, so that

$$2\Psi_*(\varepsilon) = \overline{\Psi}(\phi_*).$$

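Setting

$$c_* = \frac{1}{2} \Big[ \max_{x \in X} [g^T x - \phi_*^T A x] - \max_{y \in X} [-g^T y + \phi_*^T A y] \Big],$$

we have, similar to the proof of Lemma 3.1, the following:

$$\text{(a)} \quad \max_{x \in X} [g^T x - \phi_*^T A x - c_*] + \mathrm{ErfInv}(\varepsilon/2)\, \|\phi_*\|_2 = \Psi_*(\varepsilon),$$

$$\text{(b)} \quad \max_{y \in X} [-g^T y + \phi_*^T A y + c_*] + \mathrm{ErfInv}(\varepsilon/2)\, \|\phi_*\|_2 = \Psi_*(\varepsilon).$$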

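Now, consider the affine estimate

$$\hat g_\varepsilon(\omega) = \phi_*^T \omega + c_*.$$

From (a) it follows that

$$\forall d > \Psi_*(\varepsilon): \quad \sup_{x \in X} \mathrm{Prob}\{ g^T x - \hat g_\varepsilon(Ax + \xi) > d \} \le \varepsilon' < \varepsilon/2,$$

while (b) implies that

$$\forall d > \Psi_*(\varepsilon): \quad \sup_{y \in X} \mathrm{Prob}\{ \hat g_\varepsilon(Ay + \xi) - g^T y > d \} \le \varepsilon' < \varepsilon/2.$$

We conclude that $\mathrm{Risk}(\hat g_\varepsilon; \varepsilon) \le \Psi_*(\varepsilon)$. To complete the proof, it suffices to
demonstrate that

$$\mathrm{Risk}_*(\varepsilon) \ge \frac{\mathrm{ErfInv}(\varepsilon)}{\mathrm{ErfInv}(\varepsilon/2)}\, \Psi_*(\varepsilon). \tag{4.5}$$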

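To this end, observe that

$$\underline{\Psi}(x, y) = [g^T x - g^T y] + \inf_{\phi} \{ \phi^T A(y - x) + 2\,\mathrm{ErfInv}(\varepsilon/2)\, \|\phi\|_2 \}
= \begin{cases} g^T x - g^T y, & \|A(y - x)\|_2 \le 2\,\mathrm{ErfInv}(\varepsilon/2), \\ -\infty, & \text{otherwise}, \end{cases}$$

whence

$$\mathrm{Risk}(\hat g_\varepsilon; \varepsilon) \le \Psi_*(\varepsilon) = \tfrac{1}{2}[g^T \bar x - g^T \bar y],$$

for certain $\bar x, \bar y \in X$, such that $\|A(\bar x - \bar y)\|_2 \le 2\,\mathrm{ErfInv}(\varepsilon/2)$. Relation (4.5) can
be derived from this observation by exactly the same argument as used in
the proof of Proposition 4.1 to derive (4.3) from (4.2). □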
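The construction in the proof can be carried out by hand in a toy case. The sketch below is an illustration under strong simplifying assumptions that are not in the text: a single scalar observation ($L = 1$, so $A$ is a row vector `a` and $\phi$ is a number), $X$ a box given by `lo` and `hi`, and a ternary search standing in for a general convex solver. For a box, $\max_{x,y \in X} v^T(x - y) = \sum_i |v_i|(\mathrm{hi}_i - \mathrm{lo}_i)$, which makes $\overline{\Psi}$ explicit.

```python
from statistics import NormalDist

def erf_inv(y):
    """Upper-tail quantile of N(0,1): the x with P(N(0,1) > x) = y."""
    return -NormalDist().inv_cdf(y)

def psi_bar(phi, g, a, lo, hi, eps):
    """max over x, y in the box of [g'x - g'y + phi * a'(y - x)]
    plus 2 * ErfInv(eps/2) * |phi|, with v = g - phi * a."""
    spread = sum(abs(gi - phi * ai) * (h - l)
                 for gi, ai, l, h in zip(g, a, lo, hi))
    return spread + 2.0 * erf_inv(eps / 2.0) * abs(phi)

def fit_affine_estimate(g, a, lo, hi, eps, iters=200):
    """Return (phi, c, bound): the estimate is phi * omega + c and
    bound = psi_bar(phi) / 2 plays the role of Psi_*(eps)."""
    # psi_bar(phi) >= 2 ErfInv(eps/2) |phi|, so a minimizer satisfies
    # |phi| <= psi_bar(0) / (2 ErfInv(eps/2)).
    radius = psi_bar(0.0, g, a, lo, hi, eps) / (2.0 * erf_inv(eps / 2.0))
    left, right = -radius, radius
    for _ in range(iters):  # ternary search on a convex function
        m1 = left + (right - left) / 3.0
        m2 = right - (right - left) / 3.0
        if psi_bar(m1, g, a, lo, hi, eps) <= psi_bar(m2, g, a, lo, hi, eps):
            right = m2
        else:
            left = m1
    phi = 0.5 * (left + right)
    # c = (1/2)[max_x v'x - max_y(-v'y)] = (1/2) sum_i v_i (lo_i + hi_i)
    c = 0.5 * sum((gi - phi * ai) * (l + h)
                  for gi, ai, l, h in zip(g, a, lo, hi))
    return phi, c, 0.5 * psi_bar(phi, g, a, lo, hi, eps)

# Hypothetical toy instance: estimate x itself from omega = 2x + xi, |x| <= 1.
phi, c, bound = fit_affine_estimate([1.0], [2.0], [-1.0], [1.0], eps=0.1)
print("phi =", phi, " c =", c, " risk bound =", bound)
```

In this instance the search settles at $\phi_* = 1/2$, so the estimate is $\omega/2$, and the bound equals $\mathrm{ErfInv}(0.05)/2$, which is smaller than the trivial bound 1 obtained with $\phi = 0$.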






5. Adaptive version of the estimate. In the situation of Problem I, let
$X^1 \subset X^2 \subset \cdots \subset X^K$ be a nested collection of nonempty convex compact
sets in $\mathbb{R}^n$, such that $A(X^K) \subset \mathcal{M}$. Consider a modification of the problem
where the signal $x$ underlying our observation is known to belong to one of
$X^k$ with value of $k \le K$ unknown in advance. Given a linear form $g^T z$ on
$\mathbb{R}^n$, let $\mathrm{Risk}^k(\hat g; \varepsilon)$ and $\mathrm{Risk}^k_*(\varepsilon)$ be, respectively, the $\varepsilon$-risk of an estimate $\hat g$
on $X^k$ and the minimax optimal $\varepsilon$-risk of recovering $g^T x$ on $X^k$. Let also
$\Phi^k_*(r)$ be the function associated with $X = X^k$ according to (3.1). As it is
immediately seen, the functions $\Phi^k_*(r)$ grow with $k$. Our goal is to modify
the estimate $\hat g$ yielded by Theorem 3.1 in such a way that the $\varepsilon$-risk of the
modified estimate on $X^k$ will be "nearly" $\mathrm{Risk}^k_*(\varepsilon)$ for every $k \le K$. This
goal can be achieved by a straightforward application of the well-known
Lepskii's adaptation scheme [19, 20] as follows.

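Given $\delta > 0$, let $\delta' \in (0, \delta)$, and let $\hat g^k(\cdot)$ be the affine estimate with the
$(\varepsilon/K)$-risk on $X^k$ not exceeding $\Phi^k_*(\ln(2K/\varepsilon)) + \delta'$ provided by Theorem 3.1
as applied with $\varepsilon/K$ substituted for $\varepsilon$ and $X^k$ substituted for $X$. Then, for
any $k \le K$,

$$\sup_{x \in X^k} \mathrm{Prob}_{\omega \sim p_{A(x)}(\cdot)} \{ |\hat g^k(\omega) - g^T x| > \Phi^k_*(\ln(2K/\varepsilon)) + \delta \} \le \varepsilon'/K < \varepsilon/K. \tag{5.1}$$

Given observation $\omega$, let us say that an index $k \le K$ is $\omega$-good, if for any
$k'$, $k \le k' \le K$,

$$|\hat g^{k'}(\omega) - \hat g^k(\omega)| \le \Phi^k_*(\ln(2K/\varepsilon)) + \Phi^{k'}_*(\ln(2K/\varepsilon)) + 2\delta.$$

Note that $\omega$-good indexes do exist (e.g., $k = K$). Given $\omega$, we can find the
smallest $\omega$-good index $k = k(\omega)$; our estimate is nothing but $\hat g(\omega) = \hat g^{k(\omega)}(\omega)$.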
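The selection rule just described is easy to state in code. The sketch below is illustrative only: it takes as given the values $\hat g^k(\omega)$ of the $K$ estimates at the observed $\omega$ (list `estimates`) and the quantities $\Phi^k_*(\ln(2K/\varepsilon))$ (list `bounds`), uses 0-based indices, and the numbers in the example are made up.

```python
def smallest_good_index(estimates, bounds, delta):
    """Return the smallest index k such that, for every k' >= k,
    |estimates[k'] - estimates[k]| <= bounds[k] + bounds[k'] + 2 * delta."""
    num = len(estimates)
    for k in range(num):
        if all(abs(estimates[kp] - estimates[k]) <= bounds[k] + bounds[kp] + 2.0 * delta
               for kp in range(k, num)):
            return k
    return num - 1  # not reached for nonnegative bounds: the last index is always good

def adaptive_estimate(estimates, bounds, delta):
    """Value of the adaptive estimate: the estimate with the selected index."""
    return estimates[smallest_good_index(estimates, bounds, delta)]

# Hypothetical values for K = 4 nested sets.
estimates = [0.0, 0.9, 1.0, 1.05]
bounds = [0.1, 0.2, 0.5, 0.8]
k_selected = smallest_good_index(estimates, bounds, delta=0.05)
print("selected index:", k_selected)
print("adaptive estimate:", adaptive_estimate(estimates, bounds, delta=0.05))
```

Here index 0 is rejected because its estimate disagrees with the next one by more than the combined tolerance, while index 1 agrees with all later estimates and is therefore selected.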



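Proposition 5.1. Assume that $\varepsilon \in (0, 1/4)$, and let

$$\vartheta = 3\, \frac{\ln(2K/\varepsilon)}{\ln(2/\varepsilon)}.$$

Then, for any ($k$, $1 \le k \le K$),

$$\sup_{x \in X^k} \mathrm{Prob}_{\omega \sim p_{A(x)}(\cdot)} \{ |\hat g(\omega) - g^T x| > \vartheta\, \Phi^k_*(\ln(2/\varepsilon)) + 3\delta \} < \varepsilon, \tag{5.2}$$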

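whence also

$$\forall (k, 1 \le k \le K): \quad \mathrm{Risk}^k(\hat g; \varepsilon) \le \frac{6 \ln(2K/\varepsilon)}{\ln(1/(4\varepsilon))}\, \mathrm{Risk}^k_*(\varepsilon) + 3\delta. \tag{5.3}$$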
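The price of adaptation can be tabulated. The sketch below assumes $\vartheta = 3\ln(2K/\varepsilon)/\ln(2/\varepsilon)$, the value implied by the concavity step in the proof ($\Phi^k_*(r) \le \frac{r}{r_0}\Phi^k_*(r_0)$ for $r \ge r_0 > 0$, with $r = \ln(2K/\varepsilon)$ and $r_0 = \ln(2/\varepsilon)$); the function name is illustrative.

```python
import math

def adaptation_factor(num_sets, eps):
    """Assumed factor theta = 3 * ln(2K/eps) / ln(2/eps)."""
    return 3.0 * math.log(2.0 * num_sets / eps) / math.log(2.0 / eps)

for num_sets in (1, 10, 100, 10**4, 10**6):
    print(num_sets, round(adaptation_factor(num_sets, 0.01), 3))
```

The factor grows only logarithmically in the number $K$ of nested sets, so even a very fine grid of candidate sets costs a moderate constant.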
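Proof. Setting $r = \ln(2K/\varepsilon)$, let us fix $\bar k \le K$ and $x \in X^{\bar k}$ and call a
realization $\omega$ $x$-good, if

$$\forall (k, \bar k \le k \le K): \quad |\hat g^k(\omega) - g^T x| \le \Phi^k_*(r) + \delta. \tag{5.4}$$

Since $X^k \supset X^{\bar k}$ when $k \ge \bar k$, (5.1) implies (by the union bound over at most
$K$ indices $k$) that

$$\mathrm{Prob}_{\omega \sim p_{A(x)}(\cdot)} \{ \omega \text{ is } x\text{-good} \} \ge 1 - \varepsilon'.$$

Now, when $x$ is the signal and $\omega$ is $x$-good, relations (5.4) imply that $\bar k$ is
an $\omega$-good index, so that $k(\omega) \le \bar k$. Since $k(\omega)$ is an $\omega$-good index, we have

$$|\hat g(\omega) - \hat g^{\bar k}(\omega)| = |\hat g^{k(\omega)}(\omega) - \hat g^{\bar k}(\omega)| \le \Phi^{k(\omega)}_*(r) + \Phi^{\bar k}_*(r) + 2\delta,$$

which combines with (5.4) to imply that

$$|\hat g(\omega) - g^T x| \le 2\Phi^{\bar k}_*(r) + \Phi^{k(\omega)}_*(r) + 3\delta \le 3\Phi^{\bar k}_*(r) + 3\delta, \tag{5.5}$$

where the concluding inequality is due to $k(\omega) \le \bar k$ and to the fact that $\Phi^k_*$
grows with $k$. The bound (5.5) holds true whenever $\omega$ is $x$-good, which, as
we have seen, happens with probability $\ge 1 - \varepsilon'$. Since $\varepsilon' < \varepsilon$ and $x \in X^{\bar k}$ is
arbitrary, we conclude that

$$\mathrm{Risk}^{\bar k}(\hat g; \varepsilon) \le 3\Phi^{\bar k}_*(r) + 3\delta. \tag{5.6}$$

Using the nonnegativity and concavity of $\Phi^{\bar k}_*(\cdot)$ on the nonnegative ray and
recalling the definition of $r$, we obtain $\Phi^{\bar k}_*(r) \le \frac{\ln(2K/\varepsilon)}{\ln(2/\varepsilon)}\, \Phi^{\bar k}_*(\ln(2/\varepsilon))$ whenever
$\varepsilon \le 1/2$ and $K \ge 1$. Recalling the definition of $\vartheta$, the right-hand side in (5.6)
therefore does not exceed $\vartheta\, \Phi^{\bar k}_*(\ln(2/\varepsilon)) + 3\delta$. Since $\bar k \le K$ is arbitrary, we
have proved (5.2). This bound, due to Lemma 3.2, implies (5.3). □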